{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cattle Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the geostates package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`geostates` can be used to create choropleth plots of the United States or individual states. It is easy to use\n",
    "so we will start out with an example to show you some of the ins and outs of the package."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cattle analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Goal:** To illustrate the power of the package, we will start out by creating a plot that shows how the number of cattle varies by state in the United States."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will start by importing the `pandas` and `geostates` packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading in the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example, we use data on US cattle from the [United States Department of Agriculture National Agricultural\n",
    "Statistics Service](https://quickstats.nass.usda.gov). The CSV includes the total number of cattle (including calves) in the United States as of January 2022 broken down by each state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read in the data\n",
    "cattle_data = pd.read_csv('Desktop/cattle_data_22.csv', index_col='State', thousands=',')\n",
    "cattle_data.index = cattle_data.index.str.title()"
   ]
  },
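  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the `thousands=','` and `.str.title()` steps do, here is a minimal sketch on a toy table. The semicolon-separated layout and the name `toy` are assumptions made for illustration; the real file's layout may differ.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Toy stand-in for the USDA file (hypothetical layout): semicolon-separated\n",
    "# so the comma-grouped numbers need no quoting.\n",
    "raw = '''State;Value\n",
    "ALABAMA;1,260,000\n",
    "ALASKA;18,000\n",
    "'''\n",
    "\n",
    "# thousands=',' tells pandas to treat commas as digit-group separators\n",
    "toy = pd.read_csv(io.StringIO(raw), sep=';', index_col='State', thousands=',')\n",
    "\n",
    "# convert the upper-case state names to title case\n",
    "toy.index = toy.index.str.title()\n",
    "\n",
    "print(toy['Value'].dtype)\n",
    "print(toy.index.tolist())\n",
    "```\n",
    "\n",
    "Without `thousands=','`, values such as `1,260,000` would be read as strings and would need a separate conversion step."
   ]
  },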
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Program</th>\n",
       "      <th>Year</th>\n",
       "      <th>Period</th>\n",
       "      <th>Week Ending</th>\n",
       "      <th>Geo Level</th>\n",
       "      <th>State ANSI</th>\n",
       "      <th>Ag District</th>\n",
       "      <th>Ag District Code</th>\n",
       "      <th>County</th>\n",
       "      <th>County ANSI</th>\n",
       "      <th>Zip Code</th>\n",
       "      <th>Region</th>\n",
       "      <th>watershed_code</th>\n",
       "      <th>Watershed</th>\n",
       "      <th>Commodity</th>\n",
       "      <th>Data Item</th>\n",
       "      <th>Domain</th>\n",
       "      <th>Domain Category</th>\n",
       "      <th>Value</th>\n",
       "      <th>CV (%)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>State</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Alabama</th>\n",
       "      <td>SURVEY</td>\n",
       "      <td>2022</td>\n",
       "      <td>FIRST OF JAN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>STATE</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CATTLE</td>\n",
       "      <td>CATTLE, INCL CALVES - INVENTORY</td>\n",
       "      <td>TOTAL</td>\n",
       "      <td>NOT SPECIFIED</td>\n",
       "      <td>1260000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alaska</th>\n",
       "      <td>SURVEY</td>\n",
       "      <td>2022</td>\n",
       "      <td>FIRST OF JAN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>STATE</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CATTLE</td>\n",
       "      <td>CATTLE, INCL CALVES - INVENTORY</td>\n",
       "      <td>TOTAL</td>\n",
       "      <td>NOT SPECIFIED</td>\n",
       "      <td>18000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arizona</th>\n",
       "      <td>SURVEY</td>\n",
       "      <td>2022</td>\n",
       "      <td>FIRST OF JAN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>STATE</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CATTLE</td>\n",
       "      <td>CATTLE, INCL CALVES - INVENTORY</td>\n",
       "      <td>TOTAL</td>\n",
       "      <td>NOT SPECIFIED</td>\n",
       "      <td>960000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arkansas</th>\n",
       "      <td>SURVEY</td>\n",
       "      <td>2022</td>\n",
       "      <td>FIRST OF JAN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>STATE</td>\n",
       "      <td>5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CATTLE</td>\n",
       "      <td>CATTLE, INCL CALVES - INVENTORY</td>\n",
       "      <td>TOTAL</td>\n",
       "      <td>NOT SPECIFIED</td>\n",
       "      <td>1690000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>SURVEY</td>\n",
       "      <td>2022</td>\n",
       "      <td>FIRST OF JAN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>STATE</td>\n",
       "      <td>6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CATTLE</td>\n",
       "      <td>CATTLE, INCL CALVES - INVENTORY</td>\n",
       "      <td>TOTAL</td>\n",
       "      <td>NOT SPECIFIED</td>\n",
       "      <td>5200000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Program  Year        Period  Week Ending Geo Level  State ANSI  \\\n",
       "State                                                                       \n",
       "Alabama     SURVEY  2022  FIRST OF JAN          NaN     STATE           1   \n",
       "Alaska      SURVEY  2022  FIRST OF JAN          NaN     STATE           2   \n",
       "Arizona     SURVEY  2022  FIRST OF JAN          NaN     STATE           4   \n",
       "Arkansas    SURVEY  2022  FIRST OF JAN          NaN     STATE           5   \n",
       "California  SURVEY  2022  FIRST OF JAN          NaN     STATE           6   \n",
       "\n",
       "            Ag District  Ag District Code  County  County ANSI  Zip Code  \\\n",
       "State                                                                      \n",
       "Alabama             NaN               NaN     NaN          NaN       NaN   \n",
       "Alaska              NaN               NaN     NaN          NaN       NaN   \n",
       "Arizona             NaN               NaN     NaN          NaN       NaN   \n",
       "Arkansas            NaN               NaN     NaN          NaN       NaN   \n",
       "California          NaN               NaN     NaN          NaN       NaN   \n",
       "\n",
       "            Region  watershed_code  Watershed Commodity  \\\n",
       "State                                                     \n",
       "Alabama        NaN               0        NaN    CATTLE   \n",
       "Alaska         NaN               0        NaN    CATTLE   \n",
       "Arizona        NaN               0        NaN    CATTLE   \n",
       "Arkansas       NaN               0        NaN    CATTLE   \n",
       "California     NaN               0        NaN    CATTLE   \n",
       "\n",
       "                                  Data Item Domain Domain Category    Value  \\\n",
       "State                                                                         \n",
       "Alabama     CATTLE, INCL CALVES - INVENTORY  TOTAL   NOT SPECIFIED  1260000   \n",
       "Alaska      CATTLE, INCL CALVES - INVENTORY  TOTAL   NOT SPECIFIED    18000   \n",
       "Arizona     CATTLE, INCL CALVES - INVENTORY  TOTAL   NOT SPECIFIED   960000   \n",
       "Arkansas    CATTLE, INCL CALVES - INVENTORY  TOTAL   NOT SPECIFIED  1690000   \n",
       "California  CATTLE, INCL CALVES - INVENTORY  TOTAL   NOT SPECIFIED  5200000   \n",
       "\n",
       "            CV (%)  \n",
       "State               \n",
       "Alabama        NaN  \n",
       "Alaska         NaN  \n",
       "Arizona        NaN  \n",
       "Arkansas       NaN  \n",
       "California     NaN  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# take a look at what the CSV file looks like\n",
    "cattle_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Cleaning the data**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like our CSV file has a few extra columns including Program, Commodity, Domain, etc. that we do not need. It also shows a few columns that have missing (NaN) values. Let's start out by removing all of the unnecessary columns and removing all of the NaNs. Let's also rename the 'Value' column to 'Cattle' to make it more clear. Finally, by using the `type()` function we can check to see that the 'Cattle' column is of dtype `str`. We need to convert this to an `int`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cattle</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>State</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Alabama</th>\n",
       "      <td>1260000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alaska</th>\n",
       "      <td>18000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arizona</th>\n",
       "      <td>960000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arkansas</th>\n",
       "      <td>1690000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>5200000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Cattle\n",
       "State              \n",
       "Alabama     1260000\n",
       "Alaska        18000\n",
       "Arizona      960000\n",
       "Arkansas    1690000\n",
       "California  5200000"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# drop the NaN values and unnecessary columns\n",
    "cattle_data = cattle_data.dropna(axis='columns')\n",
    "cattle_data = cattle_data.drop(columns=['Program', 'Year', 'Period', 'Geo Level', 'State ANSI', 'watershed_code', 'Commodity',\n",
    "                         'Data Item', 'Domain', 'Domain Category'])\n",
    "\n",
    "# rename the column from 'Value' to 'Cattle'\n",
    "cattle_data = cattle_data.rename(columns={'Value': 'Cattle'})\n",
    "\n",
    "# view the first five values\n",
    "cattle_data.head()"
   ]
  },
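  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The behaviour of `dropna(axis='columns')` can be surprising, so here is a small sketch on a toy frame. The column named `Partly Missing` and the frame `toy` are made up for illustration and are not part of the USDA data.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Toy frame: one complete column, one all-NaN column, and one\n",
    "# hypothetical column with a single missing value.\n",
    "toy = pd.DataFrame(\n",
    "    {\n",
    "        'Value': [1260000, 18000],\n",
    "        'Week Ending': [np.nan, np.nan],\n",
    "        'Partly Missing': [1.0, np.nan],\n",
    "    },\n",
    "    index=pd.Index(['Alabama', 'Alaska'], name='State'),\n",
    ")\n",
    "\n",
    "# axis='columns' drops every column that contains at least one NaN;\n",
    "# no rows are removed.\n",
    "cleaned = toy.dropna(axis='columns')\n",
    "\n",
    "print(cleaned.columns.tolist())\n",
    "print(cleaned['Value'].dtype)\n",
    "```\n",
    "\n",
    "Note that a column with even one missing value is dropped entirely. That is fine here because the columns we care about are complete, but it is worth checking before reusing this step on other data."
   ]
  },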
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the total number of cattle for each state we could visualize this by creating a choropleth map\n",
    "that shows the variation in total cattle inventory by state. While this is interesting, it might not fully capture\n",
    "the variation we are looking for. For example, bigger states like California and Texas are likely to have the largest total\n",
    "number of cattle. One interesting metric we can use to compare the relative values of cattle across multiple states\n",
    "is by computing the cattle to person ratio. This allows us to examine a state's total inventory of cattle relative to\n",
    "its population."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this, we will use population data from the [United States Census Bureau's Population and Housing Unit Estimates](https://www.census.gov/programs-surveys/popest/data/data-sets.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read in the data\n",
    "population_data = pd.read_csv('Desktop/state_population_21.csv', index_col='State', thousands=',')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>State</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Oklahoma</th>\n",
       "      <td>3986639</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Nebraska</th>\n",
       "      <td>1963692</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Hawaii</th>\n",
       "      <td>1441553</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>South Dakota</th>\n",
       "      <td>895376</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tennessee</th>\n",
       "      <td>6975218</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Population\n",
       "State                   \n",
       "Oklahoma         3986639\n",
       "Nebraska         1963692\n",
       "Hawaii           1441553\n",
       "South Dakota      895376\n",
       "Tennessee        6975218"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "population_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's merge these two datasets together."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cattle</th>\n",
       "      <th>Population</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>State</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Alabama</th>\n",
       "      <td>1260000</td>\n",
       "      <td>5039877</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alaska</th>\n",
       "      <td>18000</td>\n",
       "      <td>732673</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arizona</th>\n",
       "      <td>960000</td>\n",
       "      <td>7276316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arkansas</th>\n",
       "      <td>1690000</td>\n",
       "      <td>3025891</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>5200000</td>\n",
       "      <td>39237836</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Cattle  Population\n",
       "State                          \n",
       "Alabama     1260000     5039877\n",
       "Alaska        18000      732673\n",
       "Arizona      960000     7276316\n",
       "Arkansas    1690000     3025891\n",
       "California  5200000    39237836"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "merged_df = pd.merge(cattle_data, population_data, on='State')\n",
    "merged_df.head()"
   ]
  },
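  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pd.merge` performs an inner join by default, so any state that appears in only one of the two tables is silently dropped. A minimal sketch of how to check for such rows, assuming toy frames named `cattle` and `population`; the row labelled `Hypothetical Territory` and its value are placeholders, not real data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "cattle = pd.DataFrame(\n",
    "    {'Cattle': [1260000, 18000]},\n",
    "    index=pd.Index(['Alabama', 'Alaska'], name='State'),\n",
    ")\n",
    "\n",
    "# The last row is a placeholder that has no match in the cattle table.\n",
    "population = pd.DataFrame(\n",
    "    {'Population': [5039877, 732673, 100]},\n",
    "    index=pd.Index(['Alabama', 'Alaska', 'Hypothetical Territory'], name='State'),\n",
    ")\n",
    "\n",
    "# An outer join with indicator=True adds a '_merge' column that records\n",
    "# whether each row came from the left table, the right table, or both.\n",
    "check = pd.merge(cattle, population, left_index=True, right_index=True,\n",
    "                 how='outer', indicator=True)\n",
    "\n",
    "unmatched = check[check['_merge'] != 'both']\n",
    "print(unmatched.index.tolist())\n",
    "```\n",
    "\n",
    "An empty `unmatched` frame means the state names in both tables line up and the inner join loses nothing."
   ]
  },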
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analyzing the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's compute the cattle to person ratio for each state and sort the list by descending values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "State\n",
       "South Dakota    4.244027\n",
       "Nebraska        3.462865\n",
       "North Dakota    2.387257\n",
       "Kansas          2.214966\n",
       "Wyoming         2.159629\n",
       "Montana         1.992265\n",
       "Idaho           1.341454\n",
       "Oklahoma        1.304357\n",
       "Iowa            1.205733\n",
       "Missouri        0.654974\n",
       "New Mexico      0.614402\n",
       "Wisconsin       0.593632\n",
       "Arkansas        0.558513\n",
       "Colorado        0.455948\n",
       "Kentucky        0.447954\n",
       "dtype: float64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# compute the cattle to person ratio by dividing the Cattle column by the Population column\n",
    "cattle_ratio = merged_df['Cattle']/merged_df['Population']\n",
    "\n",
    "# sort the values to see which states have the highest Cattle to Person ratio\n",
    "sorted_cattle_ratio = cattle_ratio.sort_values(ascending=False)\n",
    "\n",
    "# view the first 15 values of the sorted pandas series\n",
    "sorted_cattle_ratio.head(15)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is interesting! In fact, it turns out there are **nine states** where there are more cattle than people!"
   ]
  },
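  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rather than counting rows by eye, a boolean mask can do the counting. A minimal sketch, using four of the ratios printed above; the names `ratios` and `more_cattle` are chosen for illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# A few ratios copied from the sorted output above.\n",
    "ratios = pd.Series(\n",
    "    {\n",
    "        'South Dakota': 4.244027,\n",
    "        'Iowa': 1.205733,\n",
    "        'Missouri': 0.654974,\n",
    "        'Kentucky': 0.447954,\n",
    "    }\n",
    ")\n",
    "\n",
    "# Keep only the states with more than one head of cattle per person.\n",
    "more_cattle = ratios[ratios > 1]\n",
    "\n",
    "print(len(more_cattle))\n",
    "print(more_cattle.index.tolist())\n",
    "```\n",
    "\n",
    "Applying the same mask to the full `cattle_ratio` series gives the count for all states."
   ]
  },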
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's append this as a third column to our original dataframe and round the values to three decimal places."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "# convert the series containing the ratio to a dataframe and merge it with the original dataframe\n",
    "final_df = merged_df.merge(cattle_ratio.to_frame('Ratio'), on='State')\n",
    "\n",
    "# round the values of the ratio column to three decimal places\n",
    "final_df['Ratio'] = final_df['Ratio'].round(3)"
   ]
  },
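  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the ratio series shares its index with the merged dataframe, the column can also be added by direct assignment instead of a second merge. A minimal sketch of that alternative, using the first three rows shown earlier; the names `merged` and `with_ratio` are chosen for illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "merged = pd.DataFrame(\n",
    "    {\n",
    "        'Cattle': [1260000, 18000, 960000],\n",
    "        'Population': [5039877, 732673, 7276316],\n",
    "    },\n",
    "    index=pd.Index(['Alabama', 'Alaska', 'Arizona'], name='State'),\n",
    ")\n",
    "\n",
    "# assign() returns a new dataframe with the extra column; values are\n",
    "# aligned on the shared 'State' index.\n",
    "with_ratio = merged.assign(\n",
    "    Ratio=(merged['Cattle'] / merged['Population']).round(3)\n",
    ")\n",
    "\n",
    "print(with_ratio)\n",
    "```\n",
    "\n",
    "Both approaches should give the same table; the merge version makes the join on `State` explicit, while assignment is shorter."
   ]
  },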
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cattle</th>\n",
       "      <th>Population</th>\n",
       "      <th>Ratio</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>State</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Alabama</th>\n",
       "      <td>1260000</td>\n",
       "      <td>5039877</td>\n",
       "      <td>0.250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alaska</th>\n",
       "      <td>18000</td>\n",
       "      <td>732673</td>\n",
       "      <td>0.025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arizona</th>\n",
       "      <td>960000</td>\n",
       "      <td>7276316</td>\n",
       "      <td>0.132</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arkansas</th>\n",
       "      <td>1690000</td>\n",
       "      <td>3025891</td>\n",
       "      <td>0.559</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>California</th>\n",
       "      <td>5200000</td>\n",
       "      <td>39237836</td>\n",
       "      <td>0.133</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Colorado</th>\n",
       "      <td>2650000</td>\n",
       "      <td>5812069</td>\n",
       "      <td>0.456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Connecticut</th>\n",
       "      <td>47000</td>\n",
       "      <td>3605597</td>\n",
       "      <td>0.013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Delaware</th>\n",
       "      <td>12000</td>\n",
       "      <td>1003384</td>\n",
       "      <td>0.012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>1630000</td>\n",
       "      <td>21781128</td>\n",
       "      <td>0.075</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Georgia</th>\n",
       "      <td>1050000</td>\n",
       "      <td>10799566</td>\n",
       "      <td>0.097</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Cattle  Population  Ratio\n",
       "State                                  \n",
       "Alabama      1260000     5039877  0.250\n",
       "Alaska         18000      732673  0.025\n",
       "Arizona       960000     7276316  0.132\n",
       "Arkansas     1690000     3025891  0.559\n",
       "California   5200000    39237836  0.133\n",
       "Colorado     2650000     5812069  0.456\n",
       "Connecticut    47000     3605597  0.013\n",
       "Delaware       12000     1003384  0.012\n",
       "Florida      1630000    21781128  0.075\n",
       "Georgia      1050000    10799566  0.097"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "final_df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have a dataframe containing the ratio of cattle inventory to population we are ready to use `geostates` to visualize it!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the data using geostates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step for using the `geostates` package is to load in the geodataframe containing all of the state values. For this, we will use the `load_states()` function and assign it to a value `df`. Once we've loaded in the geodataframe we need to merge it with out cattle data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the load_states() function from the geostates package\n",
    "from geostates.shapefiles import load_states"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>STATEFP</th>\n",
       "      <th>STATENS</th>\n",
       "      <th>AFFGEOID</th>\n",
       "      <th>GEOID</th>\n",
       "      <th>NAME</th>\n",
       "      <th>LSAD</th>\n",
       "      <th>ALAND</th>\n",
       "      <th>AWATER</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>STUSPS</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>MS</th>\n",
       "      <td>28</td>\n",
       "      <td>01779790</td>\n",
       "      <td>0400000US28</td>\n",
       "      <td>28</td>\n",
       "      <td>Mississippi</td>\n",
       "      <td>00</td>\n",
       "      <td>121533519481</td>\n",
       "      <td>3926919758</td>\n",
       "      <td>MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NC</th>\n",
       "      <td>37</td>\n",
       "      <td>01027616</td>\n",
       "      <td>0400000US37</td>\n",
       "      <td>37</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>00</td>\n",
       "      <td>125923656064</td>\n",
       "      <td>13466071395</td>\n",
       "      <td>MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OK</th>\n",
       "      <td>40</td>\n",
       "      <td>01102857</td>\n",
       "      <td>0400000US40</td>\n",
       "      <td>40</td>\n",
       "      <td>Oklahoma</td>\n",
       "      <td>00</td>\n",
       "      <td>177662925723</td>\n",
       "      <td>3374587997</td>\n",
       "      <td>POLYGON ((-103.00257 36.52659, -103.00219 36.6...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>VA</th>\n",
       "      <td>51</td>\n",
       "      <td>01779803</td>\n",
       "      <td>0400000US51</td>\n",
       "      <td>51</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>102257717110</td>\n",
       "      <td>8528531774</td>\n",
       "      <td>MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WV</th>\n",
       "      <td>54</td>\n",
       "      <td>01779805</td>\n",
       "      <td>0400000US54</td>\n",
       "      <td>54</td>\n",
       "      <td>West Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>62266474513</td>\n",
       "      <td>489028543</td>\n",
       "      <td>POLYGON ((-82.64320 38.16909, -82.64300 38.169...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       STATEFP   STATENS     AFFGEOID GEOID            NAME LSAD  \\\n",
       "STUSPS                                                             \n",
       "MS          28  01779790  0400000US28    28     Mississippi   00   \n",
       "NC          37  01027616  0400000US37    37  North Carolina   00   \n",
       "OK          40  01102857  0400000US40    40        Oklahoma   00   \n",
       "VA          51  01779803  0400000US51    51        Virginia   00   \n",
       "WV          54  01779805  0400000US54    54   West Virginia   00   \n",
       "\n",
       "               ALAND       AWATER  \\\n",
       "STUSPS                              \n",
       "MS      121533519481   3926919758   \n",
       "NC      125923656064  13466071395   \n",
       "OK      177662925723   3374587997   \n",
       "VA      102257717110   8528531774   \n",
       "WV       62266474513    489028543   \n",
       "\n",
       "                                                 geometry  \n",
       "STUSPS                                                     \n",
       "MS      MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...  \n",
       "NC      MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...  \n",
       "OK      POLYGON ((-103.00257 36.52659, -103.00219 36.6...  \n",
       "VA      MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...  \n",
       "WV      POLYGON ((-82.64320 38.16909, -82.64300 38.169...  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load in the geodataframe and assign it to df\n",
    "df = load_states()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Merging the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create a choropleth map of the cattle data, we need to merge it with the geodataframe, which holds the geometry used to draw each state. We can do this with the `pandas` `merge` function. The cattle data is indexed by `State`, and the geodataframe has a matching column (`NAME`), so the state name can serve as the merge key. Let's start by renaming the `NAME` column in the geodataframe to `State` so that the names of both columns match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>STATEFP</th>\n",
       "      <th>STATENS</th>\n",
       "      <th>AFFGEOID</th>\n",
       "      <th>GEOID</th>\n",
       "      <th>State</th>\n",
       "      <th>LSAD</th>\n",
       "      <th>ALAND</th>\n",
       "      <th>AWATER</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>STUSPS</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>MS</th>\n",
       "      <td>28</td>\n",
       "      <td>01779790</td>\n",
       "      <td>0400000US28</td>\n",
       "      <td>28</td>\n",
       "      <td>Mississippi</td>\n",
       "      <td>00</td>\n",
       "      <td>121533519481</td>\n",
       "      <td>3926919758</td>\n",
       "      <td>MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NC</th>\n",
       "      <td>37</td>\n",
       "      <td>01027616</td>\n",
       "      <td>0400000US37</td>\n",
       "      <td>37</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>00</td>\n",
       "      <td>125923656064</td>\n",
       "      <td>13466071395</td>\n",
       "      <td>MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OK</th>\n",
       "      <td>40</td>\n",
       "      <td>01102857</td>\n",
       "      <td>0400000US40</td>\n",
       "      <td>40</td>\n",
       "      <td>Oklahoma</td>\n",
       "      <td>00</td>\n",
       "      <td>177662925723</td>\n",
       "      <td>3374587997</td>\n",
       "      <td>POLYGON ((-103.00257 36.52659, -103.00219 36.6...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>VA</th>\n",
       "      <td>51</td>\n",
       "      <td>01779803</td>\n",
       "      <td>0400000US51</td>\n",
       "      <td>51</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>102257717110</td>\n",
       "      <td>8528531774</td>\n",
       "      <td>MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WV</th>\n",
       "      <td>54</td>\n",
       "      <td>01779805</td>\n",
       "      <td>0400000US54</td>\n",
       "      <td>54</td>\n",
       "      <td>West Virginia</td>\n",
       "      <td>00</td>\n",
       "      <td>62266474513</td>\n",
       "      <td>489028543</td>\n",
       "      <td>POLYGON ((-82.64320 38.16909, -82.64300 38.169...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       STATEFP   STATENS     AFFGEOID GEOID           State LSAD  \\\n",
       "STUSPS                                                             \n",
       "MS          28  01779790  0400000US28    28     Mississippi   00   \n",
       "NC          37  01027616  0400000US37    37  North Carolina   00   \n",
       "OK          40  01102857  0400000US40    40        Oklahoma   00   \n",
       "VA          51  01779803  0400000US51    51        Virginia   00   \n",
       "WV          54  01779805  0400000US54    54   West Virginia   00   \n",
       "\n",
       "               ALAND       AWATER  \\\n",
       "STUSPS                              \n",
       "MS      121533519481   3926919758   \n",
       "NC      125923656064  13466071395   \n",
       "OK      177662925723   3374587997   \n",
       "VA      102257717110   8528531774   \n",
       "WV       62266474513    489028543   \n",
       "\n",
       "                                                 geometry  \n",
       "STUSPS                                                     \n",
       "MS      MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...  \n",
       "NC      MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...  \n",
       "OK      POLYGON ((-103.00257 36.52659, -103.00219 36.6...  \n",
       "VA      MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...  \n",
       "WV      POLYGON ((-82.64320 38.16909, -82.64300 38.169...  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# rename the 'NAME' column in the geodataframe to 'State'\n",
    "geo_df = df.rename(columns={'NAME': 'State'})\n",
    "geo_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Important:** To avoid accidentally losing data during the merge, include the `how='outer'` parameter in the merge statement. An outer merge keeps the rows from both dataframes, even when a state name appears in only one of them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Cattle</th>\n",
       "      <th>Population</th>\n",
       "      <th>Ratio</th>\n",
       "      <th>STATEFP</th>\n",
       "      <th>STATENS</th>\n",
       "      <th>AFFGEOID</th>\n",
       "      <th>GEOID</th>\n",
       "      <th>LSAD</th>\n",
       "      <th>ALAND</th>\n",
       "      <th>AWATER</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>1260000.0</td>\n",
       "      <td>5039877.0</td>\n",
       "      <td>0.250</td>\n",
       "      <td>01</td>\n",
       "      <td>01779775</td>\n",
       "      <td>0400000US01</td>\n",
       "      <td>01</td>\n",
       "      <td>00</td>\n",
       "      <td>131174048583</td>\n",
       "      <td>4593327154</td>\n",
       "      <td>MULTIPOLYGON (((-88.05338 30.50699, -88.05109 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Alaska</td>\n",
       "      <td>18000.0</td>\n",
       "      <td>732673.0</td>\n",
       "      <td>0.025</td>\n",
       "      <td>02</td>\n",
       "      <td>01785533</td>\n",
       "      <td>0400000US02</td>\n",
       "      <td>02</td>\n",
       "      <td>00</td>\n",
       "      <td>1478839695958</td>\n",
       "      <td>245481577452</td>\n",
       "      <td>MULTIPOLYGON (((179.48246 51.98283, 179.48656 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>960000.0</td>\n",
       "      <td>7276316.0</td>\n",
       "      <td>0.132</td>\n",
       "      <td>04</td>\n",
       "      <td>01779777</td>\n",
       "      <td>0400000US04</td>\n",
       "      <td>04</td>\n",
       "      <td>00</td>\n",
       "      <td>294198551143</td>\n",
       "      <td>1027337603</td>\n",
       "      <td>POLYGON ((-114.81629 32.50804, -114.81432 32.5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Arkansas</td>\n",
       "      <td>1690000.0</td>\n",
       "      <td>3025891.0</td>\n",
       "      <td>0.559</td>\n",
       "      <td>05</td>\n",
       "      <td>00068085</td>\n",
       "      <td>0400000US05</td>\n",
       "      <td>05</td>\n",
       "      <td>00</td>\n",
       "      <td>134768872727</td>\n",
       "      <td>2962859592</td>\n",
       "      <td>POLYGON ((-94.61783 36.49941, -94.61765 36.499...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>California</td>\n",
       "      <td>5200000.0</td>\n",
       "      <td>39237836.0</td>\n",
       "      <td>0.133</td>\n",
       "      <td>06</td>\n",
       "      <td>01779778</td>\n",
       "      <td>0400000US06</td>\n",
       "      <td>06</td>\n",
       "      <td>00</td>\n",
       "      <td>403503931312</td>\n",
       "      <td>20463871877</td>\n",
       "      <td>MULTIPOLYGON (((-118.60442 33.47855, -118.5987...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        State     Cattle  Population  Ratio STATEFP   STATENS     AFFGEOID  \\\n",
       "0     Alabama  1260000.0   5039877.0  0.250      01  01779775  0400000US01   \n",
       "1      Alaska    18000.0    732673.0  0.025      02  01785533  0400000US02   \n",
       "2     Arizona   960000.0   7276316.0  0.132      04  01779777  0400000US04   \n",
       "3    Arkansas  1690000.0   3025891.0  0.559      05  00068085  0400000US05   \n",
       "4  California  5200000.0  39237836.0  0.133      06  01779778  0400000US06   \n",
       "\n",
       "  GEOID LSAD          ALAND        AWATER  \\\n",
       "0    01   00   131174048583    4593327154   \n",
       "1    02   00  1478839695958  245481577452   \n",
       "2    04   00   294198551143    1027337603   \n",
       "3    05   00   134768872727    2962859592   \n",
       "4    06   00   403503931312   20463871877   \n",
       "\n",
       "                                            geometry  \n",
       "0  MULTIPOLYGON (((-88.05338 30.50699, -88.05109 ...  \n",
       "1  MULTIPOLYGON (((179.48246 51.98283, 179.48656 ...  \n",
       "2  POLYGON ((-114.81629 32.50804, -114.81432 32.5...  \n",
       "3  POLYGON ((-94.61783 36.49941, -94.61765 36.499...  \n",
       "4  MULTIPOLYGON (((-118.60442 33.47855, -118.5987...  "
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# outer merge on 'State' keeps the rows from both dataframes\n",
    "data = pd.merge(final_df, geo_df, on='State', how='outer')\n",
    "data.head()"
   ]
  },
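  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An outer merge keeps unmatched rows but does not point them out. As a minimal sketch, assuming two small toy dataframes that stand in for `final_df` and `geo_df` (the names `cattle` and `shapes` and the deliberately unmatched rows are illustrative, not part of the notebook's data), the `indicator=True` option of `pd.merge` can be used to list the state names that appear in only one dataframe:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# toy stand-ins for final_df and geo_df; the last row of each is deliberately unmatched\n",
    "cattle = pd.DataFrame({'State': ['Alabama', 'Alaska', 'Arizona'],\n",
    "                       'Cattle': [1260000, 18000, 960000]})\n",
    "shapes = pd.DataFrame({'State': ['Alabama', 'Alaska', 'District of Columbia'],\n",
    "                       'STATEFP': ['01', '02', '11']})\n",
    "\n",
    "# indicator=True adds a '_merge' column holding 'both', 'left_only' or 'right_only'\n",
    "merged = pd.merge(cattle, shapes, on='State', how='outer', indicator=True)\n",
    "\n",
    "# keep only the rows that did not find a partner in the other dataframe\n",
    "unmatched = merged[merged['_merge'] != 'both']\n",
    "print(unmatched[['State', '_merge']])\n",
    "```\n",
    "\n",
    "Running the same check on the real dataframes is one way to catch spelling differences in state names before plotting."
   ]
  },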
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To plot the data, we use the `plot_states` function from the `geostates` package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the plot_states() function from geostates\n",
    "from geostates.plot import plot_states"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a choropleth map that displays the cattle-to-person ratio for each state in the United States\n",
    "# plot = plot_states(data_2, column='Ratio', cmap=new_cmap, labels='both', linestyle='none', legend='colorbar',\n",
    "#                    bins=15)\n",
    "\n",
    "# add a title to the plot\n",
    "# plot.annotate('Cattle to Person Ratio 2022', xy=(-97, 50.5), fontsize=18, ha='center');"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}